229 research outputs found
Parallel Construction of Wavelet Trees on Multicore Architectures
The wavelet tree has become a very useful data structure to efficiently
represent and query large volumes of data in many different domains, from
bioinformatics to geographic information systems. One problem with wavelet
trees is their construction time. In this paper, we introduce two algorithms
that reduce the time complexity of a wavelet tree's construction by taking
advantage of today's ubiquitous multicore machines.
Our first algorithm constructs all the levels of the wavelet tree in parallel,
with time and working-space bounds that depend on the size of the input
sequence and the size of the alphabet. Our second algorithm constructs the
wavelet tree in a domain-decomposition fashion, using our first algorithm in
each segment, with time and extra-space bounds that also depend on the number
of available cores. Both algorithms are practical and report good
speedup on large real datasets.
Comment: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Skłodowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
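To illustrate why the levels of a wavelet tree can be built independently of one another, here is a minimal Python sketch. It is not the authors' algorithm: it assumes integer symbols in the range 0 to sigma - 1, stores each level as a plain list of bits, and uses a stable sort per level, which is simple but not work-efficient. The names `build_level`, `build_wavelet_tree` and `access` are hypothetical. Each level depends only on the input sequence, so the levels are submitted as separate tasks; under CPython the thread pool shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_level(seq, level, num_bits):
    """Bitmap of one level of a levelwise (pointerless) wavelet tree."""
    shift = num_bits - level
    # A stable sort by the top `level` bits lays out this level's nodes from
    # left to right while keeping the original order inside each node.
    ordered = sorted(seq, key=lambda s: s >> shift)
    # Every symbol contributes its next most significant bit.
    return [(s >> (shift - 1)) & 1 for s in ordered]


def build_wavelet_tree(seq, sigma):
    """Build all levels; each level is an independent task."""
    num_bits = max(1, (sigma - 1).bit_length())
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda lvl: build_level(seq, lvl, num_bits),
                             range(num_bits)))


def access(levels, i):
    """Recover seq[i] from the bitmaps alone (naive linear-time rank)."""
    lo, hi, pos, symbol = 0, len(levels[0]), i, 0
    for bits in levels:
        bit = bits[pos]
        zeros_in_node = bits[lo:hi].count(0)
        rank = bits[lo:pos].count(bit)  # equal bits before pos in this node
        if bit == 0:
            pos, hi = lo + rank, lo + zeros_in_node
        else:
            pos, lo = lo + zeros_in_node + rank, lo + zeros_in_node
        symbol = (symbol << 1) | bit
    return symbol
```

For example, `build_wavelet_tree([3, 0, 2, 1, 3, 1], 4)` yields one bitmap per bit of the symbols, and `access(levels, 2)` walks those bitmaps from the top level down to recover the symbol at position 2.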
Indexing and Retrieval Techniques for Documents Using Geographic and Textual References (Técnicas de indexación y recuperación de documentos utilizando referencias geográficas y textuales)
[Abstract] The Internet and the World Wide Web have become an enormous repository of information consulted daily by millions of users. In addition, other information repositories, such as document databases and digital libraries, have also grown considerably in popularity. As a result, information retrieval has become one of the most important research areas in computer science.
Although these repositories contain information of different kinds, the most common is textual. The text of a document often contains geographic references that make it possible to assign the document a region of space in which it is relevant. Users of the systems listed above increasingly demand services that let them place the retrieved information on a map. There is also growing interest in queries that retrieve documents relevant not only to a given topic but also to a given area. Developing system architectures, indexing structures and other components that meet these needs is the main goal of a new research area called geographic information retrieval (GIR).
In this thesis we address several topics of interest in the area. First, the indexing structures that allow documents to be retrieved using both their textual scope and their spatial scope do not take into account the hierarchical nature of geographic space or the topological relationships between the spatial objects they index. Our first objective is therefore to develop a structure that solves the problems caused by these limitations. This structure forms the basis of the architecture for GIR systems that we propose as the second objective of the thesis. We study the limitations of the GIR system architectures proposed to date and propose a generic, modular and extensible architecture. We also develop a prototype system based on this architecture. Finally, as the third objective of this thesis, we propose a structure for indexing geographic objects that is optimized for the characteristics of the information typically handled in GIR systems.
When Edge Computing Meets Compact Data Structures
Edge computing enables data processing and storage closer to where the data
are created. Given the widely distributed compute environment and the
highly dispersed data, there is increasing demand for data
sharing and collaborative processing on the edge. Since data shuffling can
dominate the overall execution time of collaborative processing jobs, and
edge environments have limited power supply and bandwidth, it is crucial
to reduce the communication overhead
across edge devices. Compared with data compression, compact data structures
(CDS) seem more suitable in this case because they allow
data to be queried, navigated, and manipulated directly in compact form.
However, existing work on applying CDS to edge computing generally
focuses on the intuitive benefit of reduced data size; few studies discuss
the challenges, let alone investigate real-world edge use cases
empirically. This research highlights the challenges,
opportunities, and potential scenarios of CDS implementation in edge computing.
Driven by the use case of shuffling-intensive data analytics, we propose a
three-layer architecture for CDS-aided data processing and study in particular
the feasibility and efficiency of the CDS layer. We expect this research to
foster joint research efforts on CDS-aided edge data analytics and to have
wider practical impact.
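As a concrete picture of what "queried directly in compact form" can mean, the Python sketch below packs a bit sequence into 64-bit words and answers `access` and `rank1` queries without unpacking it. This is a textbook-style illustration and not the CDS layer studied in the paper; the class name `RankBitVector` and the block size are assumptions made for the example.

```python
class RankBitVector:
    """Bit vector packed into 64-bit words with one rank sample per word."""

    BLOCK = 64

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []
        self.block_ranks = [0]  # block_ranks[k] = ones before word k
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
            self.block_ranks.append(self.block_ranks[-1] + bin(word).count("1"))

    def access(self, i):
        """Return the bit at position i."""
        return (self.words[i // self.BLOCK] >> (i % self.BLOCK)) & 1

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        block, offset = divmod(i, self.BLOCK)
        count = self.block_ranks[block]
        if offset:
            mask = (1 << offset) - 1
            count += bin(self.words[block] & mask).count("1")
        return count
```

A device could ship the packed words and the rank samples instead of the raw sequence, and the receiver could still answer both queries on the packed form.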
Geographic Information Systems in Tourism (Los sistemas de información geográfica en turismo)
[Abstract] The Internet has become one of the most popular places to publish and search for almost any type of information. In particular, tourist information has received much attention on the Internet over the past few years: not only information about travel, resources, places, museums or monuments, but also about cultural tourism. In this article we discuss the potential offered by Geographic Information Systems (GIS) for publishing and accessing tourist information through interfaces capable of generating interactive maps with information associated with each element of interest shown on the maps. In addition, as a case study, we describe the Virtual Trip of the Galician Virtual Library (http://bvg.udc.es), a Web-accessible system that uses GIS technologies to provide easy access to any tourist or cultural information about Galicia.
Space-Efficient Representations of Raster Time Series
Open access publication funded by: Universidade da Coruña/CISUG
[Abstract] Raster time series, a.k.a. temporal rasters, are collections of rasters covering the same region at consecutive timestamps. These data have been used in many different applications, ranging from weather forecast systems to the monitoring of forest degradation or soil contamination. Many different sensors generate this type of data, which makes such analyses possible but also challenges the technological capacity to store and retrieve the data. In this work, we propose a space-efficient representation of raster time series that is based on Compact Data Structures (CDS). Our method uses a strategy of snapshots and logs to represent the data, in which both components are represented using CDS. We study two variants of this strategy, one with regular sampling and another based on a heuristic that determines at which timestamps the snapshots should be created to reduce space redundancy. We perform a comprehensive experimental evaluation using real datasets. The results show that the proposed strategy is competitive in space with alternatives based on pure data compression, while providing much more efficient query times for different types of queries.
The data used in this study were acquired as part of the mission of NASA’s Earth Science Division and archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC). Funding: CITIC, as a Research Center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, with 80% provided through ERDF Funds, ERDF Operational Programme Galicia 2014-2020, and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01).
This work was also supported by Xunta de Galicia/FEDER-UE under Grants [IG240.2020.1.185; IN852A 2018/14]; Ministerio de Ciencia, Innovación y Universidades under Grants [TIN2016-78011-C4-1-R; RTC-2017-5908-7; PID2019-105221RB-C41/AEI/10.13039/501100011033]; ANID - Millennium Science Initiative Program - Code ICN17_002; Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (CYTED) [Grant No. 519RT0579]
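The snapshot-and-log strategy with regular sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rasters are plain lists of rows, a full snapshot is kept every `period` timestamps, and each log stores only the cells that differ from the preceding snapshot. The abstract does not say what the logs contain or how they are encoded, and the paper represents both components with CDS rather than the lists and dictionaries used here. The names `encode` and `get_cell` are hypothetical.

```python
def encode(rasters, period):
    """Split a raster time series into full snapshots and difference logs."""
    snapshots, logs = {}, {}
    for t, raster in enumerate(rasters):
        if t % period == 0:
            # Regular sampling: keep a full copy every `period` timestamps.
            snapshots[t] = [row[:] for row in raster]
        else:
            # Assumed log content: cells that differ from the last snapshot.
            base = snapshots[t - t % period]
            logs[t] = {(r, c): value
                       for r, row in enumerate(raster)
                       for c, value in enumerate(row)
                       if value != base[r][c]}
    return snapshots, logs


def get_cell(snapshots, logs, period, t, r, c):
    """Value of cell (r, c) at timestamp t without rebuilding the raster."""
    base = snapshots[t - t % period]
    if t % period == 0:
        return base[r][c]
    return logs[t].get((r, c), base[r][c])
```

With a small `period` the logs stay short but more snapshots are stored; a large `period` saves snapshots at the cost of longer logs. The heuristic variant mentioned in the abstract addresses this trade-off by choosing the snapshot timestamps.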
Privacy-enhancing distributed protocol for data aggregation based on blockchain and homomorphic encryption
The recent increase in reported security breaches that compromise users' privacy calls into question the current centralized model, in which third parties collect and control massive amounts of personal data. Blockchain has demonstrated that trusted and auditable computing is possible using a decentralized network of peers accompanied by a public ledger. Furthermore, Homomorphic Encryption (HE) guarantees confidentiality not only during computation but also during transmission and storage. Blockchain and HE are increasingly being used together in computing environments.
This research proposes a privacy-enhancing, distributed and secure protocol for data aggregation built on Blockchain and HE technologies. Blockchain acts as a distributed ledger that facilitates efficient data aggregation through a Smart Contract. On top of it, HE is used to encrypt the data, allowing private aggregation operations. The theoretical description, potential applications, a suggested implementation and a performance analysis are presented to validate the proposed solution.
This work has been partially supported by the Basque Country Government under the ELKARTEK program, project TRUSTIND (KK-2020/00054). It has also been partially supported by the H2020 TERMINET project (GA 957406)
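To show how an aggregator can add values it cannot read, the Python sketch below uses a toy version of the Paillier cryptosystem, which is additively homomorphic. The abstract does not name the HE scheme used by the protocol, so Paillier is an assumption here, as are the function names and the tiny demonstration primes, which offer no security. The `aggregate` function stands in for the role a smart contract would play, and no blockchain is involved. The modular inverse via `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2


def aggregate(n, ciphertexts):
    """Multiply ciphertexts; the product encrypts the sum of the plaintexts."""
    n2 = n * n
    total = 1
    for c in ciphertexts:
        total = (total * c) % n2
    return total


def decrypt(n, private_key, ciphertext):
    """Recover the plaintext (modulo n) with the private key."""
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n
```

In use, each participant would call `encrypt` on its own reading, the aggregator would call `aggregate` on the ciphertexts only, and the key holder would call `decrypt` once on the result to obtain the sum, provided the sum stays below `n`.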
Boosting Perturbation-Based Iterative Algorithms to Compute the Median String
[Abstract] The most competitive heuristics for calculating the median string are those that use perturbation-based iterative algorithms. Given the complexity of this problem, which under many formulations is NP-hard, the computational cost of an exact solution is not affordable. In this work, we address the heuristic algorithms that solve this problem, with emphasis on their initialization and on the policy used to order the possible edit operations. Both factors carry significant weight in the solution of this problem. The choice of initial string influences the algorithm's speed of convergence, as does the criterion used to select the modification made in each iteration. To obtain the initial string, we use the median of a subset of the original dataset; to obtain this subset, we apply the Half Space Proximal (HSP) test to the median of the dataset. This test provides sufficient diversity among the members of the subset while also fulfilling the centrality criterion. We also analyze the stop condition of the algorithm, improving its performance without substantially degrading the quality of the solution. To analyze the results of our experiments, we measured the execution time of each proposed modification of the algorithms, the number of edit distances computed, and the quality of the solution obtained. With these experiments, we empirically validated our proposal.
This work was supported in part by the Comisión Nacional de Investigación Científica y Tecnológica - Programa de Formación de Capital Humano Avanzado (CONICYT-PCHA)/Doctorado Nacional/2014-63140074 through the Ph.D. Scholarship, in part by the European Union's Horizon 2020 under the Marie Sklodowska-Curie under Grant 690941, in part by the Millennium Institute for Foundational Research on Data (IMFD), and in part by the FONDECYT-CONICYT under Grant 1170497.
The work of Óscar Pedreira was supported in part by the Xunta de Galicia/FEDER-UE refs under Grant CSI ED431G/01 and Grant GRC: ED431C 2017/58, in part by the Office of the Vice President for Research and Postgraduate Studies of the Universidad Católica de Temuco, VIPUCT Project 2020EM-PS-08, and in part by the FEQUIP 2019-INRN-03 of the Universidad Católica de Temuco
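The general shape of a perturbation-based iterative heuristic for the median string can be sketched in Python as follows. This is a generic baseline and not the method proposed in the paper: it starts from the set median (the input string with the smallest total distance) instead of the HSP-based initialization, tries edit operations in a fixed order rather than a ranked one, and stops when no single edit improves the total Levenshtein distance. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def total_distance(candidate, strings):
    """Sum of edit distances from the candidate to every input string."""
    return sum(levenshtein(candidate, s) for s in strings)


def perturbations(s, alphabet):
    """Every string reachable from s with exactly one edit operation."""
    for i in range(len(s) + 1):
        for ch in alphabet:
            yield s[:i] + ch + s[i:]  # insertion
        if i < len(s):
            yield s[:i] + s[i + 1:]  # deletion
            for ch in alphabet:
                if ch != s[i]:
                    yield s[:i] + ch + s[i + 1:]  # substitution


def approximate_median(strings):
    """Improve the set median one edit at a time until no edit helps."""
    alphabet = sorted(set("".join(strings)))
    best = min(strings, key=lambda s: total_distance(s, strings))
    best_cost = total_distance(best, strings)
    improved = True
    while improved:
        improved = False
        for candidate in perturbations(best, alphabet):
            cost = total_distance(candidate, strings)
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
                break  # restart the search from the improved string
    return best, best_cost
```

The loop terminates because the total distance is a non-negative integer that strictly decreases with every accepted edit. Most of the running time goes into the distance computations, which is why the abstract reports the number of edit distances computed as one of its measures.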